1048. Longest String Chain
Medium

You are given an array of words where each word consists of lowercase English letters.

wordA is a predecessor of wordB if and only if we can insert exactly one letter anywhere in wordA without changing the order of the other characters to make it equal to wordB.

  • For example, "abc" is a predecessor of "abac", while "cba" is not a predecessor of "bcad".

A word chain is a sequence of words [word1, word2, ..., wordk] with k >= 1, where word1 is a predecessor of word2, word2 is a predecessor of word3, and so on. A single word is trivially a word chain with k == 1.

Return the length of the longest possible word chain with words chosen from the given list of words.

 

Example 1:

Input: words = ["a","b","ba","bca","bda","bdca"]
Output: 4
Explanation: One of the longest word chains is ["a","ba","bda","bdca"].

Example 2:

Input: words = ["xbc","pcxbcf","xb","cxbc","pcxbc"]
Output: 5
Explanation: All the words can be put in a word chain ["xb", "xbc", "cxbc", "pcxbc", "pcxbcf"].

Example 3:

Input: words = ["abcd","dbqca"]
Output: 1
Explanation: The trivial word chain ["abcd"] is one of the longest word chains.
["abcd","dbqca"] is not a valid word chain because the ordering of the letters is changed.

 

Constraints:

  • 1 <= words.length <= 1000
  • 1 <= words[i].length <= 16
  • words[i] only consists of lowercase English letters.
Accepted
123.6K
Submissions
219.8K
Seen this question in a real interview before?
Companies
Show Hint 1
Instead of adding a character, try deleting a character to form a chain in reverse.
Show Hint 2
For each word in order of length, for each word2 which is word with one character removed, length[word2] = max(length[word2], length[word] + 1).

Average Rating: 4.83 (36 votes)

Premium

Solution


Overview

A word chain is a sequence of words (word1 -> word2 -> word3 -> word4 -> word5......) such that word1 is a predecessor of word2 and so on. A key point in the problem statement is that word1 can be a predecessor of word2 if and only if we can add exactly one letter anywhere in word1 to make it equal to word2. In other words, word2 should have one letter more than word1 and the position of this new letter can be anywhere. Note that the order of the words in the list does not need to be maintained while creating the word sequence.

Suppose that word1 is ab then word2 can be ab*, a*b, *ab where * is any lowercase English letter.

Therefore, it is possible for a particular word to have more than one predecessor in the given list, and thus belong to more than one word sequence. Our objective is to determine the length of the longest possible word sequence.

Let us consider the following example : ['abcd','abc','bcd','abd','ab','ad','b'] In this list, the immediate predecessors of abcd are ['abc','bcd','abd'] as all these words are missing exactly one letter from the word abcd. Similarly, the immediate predecessors of abd are ['ab','ad'] and the predecessor of ab is ['b'].



Approach 1: Top-Down Dynamic Programming (Recursion + Memoization)

Intuition

If you're not familiar with DFS (Depth First Search), check out our Explore Card.

Here we work backwards to find the longest chain, this means that we will start from a word and delete one character at a time. We continue this chain until we come across a word that is not present in the list or is one letter long.

In the above example some of the possible word sequences are: abcd -> abd -> ab -> b , abcd -> abc -> ab -> b , abcd -> bcd and so on. The possible word sequences are illustrated in Figure 1.

fig

Figure 1. Figure demonstrating DFS to find the longest word sequence.

In this graph, we can observe that the length of the longest possible word sequence is 4. There are two word sequences that have the longest length : abcd -> abd -> ab -> b and abcd -> abc -> ab -> b. (The longest path is shown in the diagram with red arrows).

Notice that a particular sequence can be a part of more than one word sequence. For example the sequence ab -> b is part of both the following sequences : abcd -> abd -> ab -> b and abcd -> abc -> ab -> b. This leads to repeated calculations because every time we encounter ab we need to explore the subpath ab -> a. For a small list, this is not a problem but as the size of the list increases, the size of the graph grows exponentially.

What we can do is whenever we encounter a new word, we will find all possible sequences with this word as the last word in the sequence. Then, we will store the length of the longest possible sequence that ends with this word.

We will use a map for this where each key will be an ending word and the value will be the length of the longest possible word sequence ending with this word. In the above example when we first encounter the word ab we will store the value 2 (word sequence ab -> b) for key ab. The next time we encounter ab, we will simply return the value stored against it in the map instead of going through the entire subtree again. This process is known as memoization and it prevents recalculation. For every word present in the list, we only need to determine the length of the longest path that ends with this word once.

Algorithm

  1. Initialize a set (wordsPresent) and add all the words in the list to the set. This set will be used to check if a word is present in the list.

  2. Initialize a map (memo) having key type as String and value type as Integer. This map will store the length of the longest possible word sequence where the key is the last word in the sequence.

  3. Iterate over the list. For each word in the list perform a depth-first search.

  4. In the DFS, consider the current word (currentWord) as the last word in the word sequence.

  5. If currentWord was encountered previously we just return its corresponding value in the map memo.

  6. Initialize maxLength to 1.

  7. Iterate over the entire length of the currentWord.

    • Create all possible words (newWord) by taking out one character at a time.
    • If newWord is present in the set perform a DFS with this word and store the intermediate result in a variable currentLength.
    • Update the maxLength so that it contains the length of the longest sequence possible where the currentWord is the end word.
  8. Set the maxLength as the value for currentWord (key) in the map.

  9. Return maxLength.

Implementation

Complexity Analysis

Let NN be the number of words in the list and LL be the maximum possible length of a word.

  • Time complexity: O(L2N)O(L ^ 2 \cdot N).

    Initially, we iterate over the list to store all the given words in a set (adds NN to the complexity).

    Next, we perform a DFS for each word (O(N)O(N)). For each word, we iterate over its length(O(L)O(L)). At each index (i) we create a new word by deleting the character at position i from the original word (O(L)O(L)). Therefore, the overall time complexity is O(N+(L2N))O(N + (L ^ 2 \cdot N)) = O(L2N))O(L ^ 2 \cdot N)), because the NN term is insignificant relative to the L2NL ^ 2 \cdot N term. Note that because of memoization we can be sure that each word in the list is traversed only once.

  • Space complexity: O(N)O(N).

    The extra space is used by the recursion call stack. In worst case all the words are a part of the longest word sequence which requires a recursion stack size of NN.

    Also, we use a set to store all distinct words (size NN) and a map to store intermediate results (size NN). Since the maximum number of distinct words will be NN (when there is no repetition) the overall space complexity is O(2N)O(2 \cdot N) which in Big O notation equals O(N)O(N).



Approach 2: Bottom-Up Dynamic Programming

Intuition

In this solution, we will create the word sequence by adding one letter at a time to the last word in the sequence. Thus the resulting word sequence will be a series of words where each word has one more letter than its predecessor.

If we know the length (previousLength) of the longest word sequence that ends with a word we can use this value to find the length of the longest word sequence for its successor(newLength = previousLength + 1).

Let us again consider the above example ['abcd','abc','bcd','abd','ab','ad','b']. The longest word sequence with the word b is 1. Thus the length of the longest word sequence with the word ab will be 1 + 1 = 2 (ab -> b). This result can in turn be used to find the length of the longest word sequence for the word abc (2 + 1 = 3 for sequence abc -> ab -> b).

The length of the words in a sequence increases as we move from left to right. Also, we know that the order of the words in the list doesn't matter. So we can sort the words in ascending order based on their length. Next, we can iterate over the sorted list and calculate the length of the longest sequence possible where the word at index ii is the end word. We store this result in a map where key is the word and value is the sequence length. By doing this we ensure that, for each word that we encounter, we already know the result of all of its possible predecessors. This process is demonstrated in the following animation.

Current
1 / 9

Algorithm

  1. Initialize a map where key is the word and value is the length of the longest word sequence possible with the key as the end word.

  2. Sort the word list in increasing order of the word length.

  3. Initialize longestWordSequenceLength to 1. This variable holds the length of the longest word sequence possible.

  4. Iterate over the sorted list.

  5. For each word initialize presentLength to 1.

  6. Iterate over the entire length of each word.

    • Delete the character at ithi^{th} position from the current word and assign the new word to the variable predecessor.
    • Check if predecessor is present in the list or not.
    • If the predecessor is present, then assign its mapped value to previousLength. Update the presentLength if previousLength + 1 is greater than the presentLength.
  7. After terminating the inner for loop, assign presentLength to the current word in the map dp.

  8. Update the longestWordSequenceLength if the longest word sequence formed with the current word as the end word is longer than the previously considered word sequence.

  9. After terminating the outer for loop, return longestWordSequenceLength.

Implementation

Complexity Analysis

Let NN be the number of words in the list and LL be the maximum possible length of a word.

  • Time complexity: O(N(logN+L2))O(N \cdot (\log N + L ^ 2)).

    Sorting a list of size NN takes O(NlogN)O(N \log N) time. Next, we use two for loops in which the outer loop runs for O(N)O(N) iterations and the inner loop runs for O(L2)O(L ^ 2) iterations in the worst case scenario. The first LL is for the inner loop and the second LL is for creating each predecessor. Thus the overall time complexity is O(NlogN+(NL2))O(N \log N + (N \cdot L ^ 2)) which equals O(N(logN+L2))O(N \cdot (\log N + L ^ 2)).

  • Space complexity: O(N)O(N).

    We use a map to store the length of the longest sequence formed with each of the NN words as the end word.



Report Article Issue

Comments: 13
Baggins12's avatar
Read More

Am I the only one who feels too dumb after not being able to come up with any solution for this 'medium' problem? :(...

8
Reply
Share
Report
LordPain's avatar
Read More

Neat and clear Explanation! Keep it up! ..Wish the majority of official solution articles were like this..

2
Reply
Share
Report
Mazhar_MIK's avatar
Read More

Very well explained.
Just one more addition in case it makes anyone curious, It can be solved using the concept of LIS(Longest Increasing Subsequence) also provided you sort the input first in ascending order of length.
Link : https://leetcode.com/problems/longest-string-chain/discuss/1213820/C%2B%2B-(Why-it-looks-like-LIS-pattern-(easy-comments))-%3A-Memoized-greaterDP

1
Show 3 replies
Reply
Share
Report
Gorom_pani's avatar
Read More

I just loved the explanation....keep it up

1
Reply
Share
Report
karansdoshi's avatar
Read More

You can expect this question in an interview. There are at least 3 methods of solving this question.

0
Reply
Share
Report
AkaYS's avatar
Read More

can someone enlighten me about the time complexity of substr fn in c++? In my opinion, it does take linear time to fetch u the substring, but I don't see it being counted in time complexity as of the whole. Why is it so?? am I wrong with this concept? If yes plz correct me.

0
Show 2 replies
Reply
Share
Report
Yinyu78's avatar
Read More

Approach 2: word may be searched first from dp before removing any char, helpful if there are any duplicated word in the given words list:

    for (const string &word : words) {
        int presentLength = 0;
        if (dp.find(word) != dp.end()) {
            presentLength  = dp[word];
        }
        else {
            presentLength = 1;
            for (int i = 0; i < word.length(); i++) {
            ...
            }
            dp[word] = presentLength;
        }

0
Reply
Share
Report
sriharik's avatar
Read More

I'm curious to know how you decided to approach the problem by deleting characters and going backwards, I spent a very very long time trying to add characters.

0
Show 1 reply
Reply
Share
Report
Binga45's avatar
Read More

I think the complexity for the 1st approach would be N^2 * L^2. Consider the max length of sequence be half of the words, i.e.., N/2. Now we are performing dfs on each word so outer for loop costs O(N). For the reasons mentioned in the solution, there is L^2 complexity in each dfs loop, but we are also recurring again at maximum till depth N/2 in this example, if the next word is present in the hashset which would cost additional O(N/2). so the complexity for this case will be O(N * N/2 *(L^2)).

0
Reply
Share
Report
Goldspear's avatar
Read More

Nice article!

One further optimisation for the bottom-up DP solution: hash the words by their length and skip the search, if for a group of words of length L, there's no L - 1 words exist (O(1) look-up with the extra hash table). This reduces the time complexity to O(U*logU + N*L^2) where U <= N is the number of unique word length values.

The following code runs below 95ms and faster than 99% submissions.

def longestStrChain(self, words: List[str]) -> int:
        from collections import defaultdict
        # dp represents longest chain length with the key word being the end of the chain
        dp: Dict[str, int] = {} 
        len_hashed = defaultdict(set)
        for w in words:
            len_hashed[len(w)].add(w)
            dp[w] = 1
        
        res = 1
        # Optimisation1 - allows faster sorting if there are words of the same length
        for i in sorted(len_hashed.keys()):
            # Optimisation 2 - allows skipping multiple words and their substring search if i - 1 length words don't exist
            if i - 1 not in len_hashed:
                continue
            for w in len_hashed[i]:
                for j in range(i):
                    temp = w[:j] + w[j + 1:]
                    if temp in dp:
                        dp[w] = max(dp[w], dp[temp] + 1)
                        res = max(dp[w], res)
        return res

0
Reply
Share
Report
Time Submitted
Status
Runtime
Memory
Language
05/17/2021 13:02Accepted84 ms17.3 MBcpp
04/05/2021 12:39Accepted84 ms17 MBcpp
04/05/2021 12:35Compile ErrorN/AN/Acpp
/23
Contribute
|||
Saved
Trie
This list is empty.